
    Analysis of Multitarget Activities and Assay Interference Characteristics of Pharmaceutically Relevant Compounds

    The availability of large amounts of data in public repositories provides a useful source of knowledge in the field of drug discovery. Given the increasing sizes of compound databases and volumes of activity data, computational data mining can be used to study different characteristics and properties of compounds on a large scale. One of the major sources of new compounds in the early phase of drug discovery is high-throughput screening, where millions of compounds are tested against many targets. The screening data provide opportunities to assess activity profiles of compounds. This thesis aims at systematically mining activity data from publicly available sources in order to study the growth of bioactive compounds and to analyze multitarget activities and assay interference characteristics of pharmaceutically relevant compounds in the context of polypharmacology. In the first study, the growth of bioactive compounds against five major target families is monitored over time, and the compound-scaffold-CSK (cyclic skeleton) hierarchy is applied to investigate the structural diversity of active compounds and the topological diversity of their scaffolds. The next part of the thesis is based on the analysis of screening data. Initially, extensively assayed compounds are mined from the PubChem database and the promiscuity of these compounds is assessed by taking assay frequencies into account. Next, DCM (dark chemical matter), that is, consistently inactive compounds that have been extensively tested, is systematically extracted and analog relationships with bioactive compounds are determined in order to derive target hypotheses for DCM. Further, PAINS (pan-assay interference compounds) are identified in the extensively tested set of compounds using substructure filters and their assay interference characteristics are studied. Finally, the limitations of PAINS filters are addressed using machine learning models that can distinguish between promiscuous and DCM PAINS. Structural context dependence of PAINS activities is studied by assessing predictions through feature weighting and mapping.

    Determining the Degree of Promiscuity of Extensively Assayed Compounds.

    In the context of polypharmacology, an emerging concept in drug discovery, promiscuity is rationalized as the ability of compounds to specifically interact with multiple targets. Promiscuity of drugs and bioactive compounds has thus far been analyzed computationally on the basis of activity annotations, without taking assay frequencies or inactivity records into account. Most recent estimates have indicated that bioactive compounds interact on average with only one to two targets, whereas drugs interact with six or more. In this study, we have further extended promiscuity analysis by identifying the most extensively assayed public domain compounds and systematically determining their promiscuity. These compounds were tested in hundreds of assays against hundreds of targets. In our analysis, assay promiscuity was distinguished from target promiscuity and separately analyzed for primary and confirmatory assays. Differences between the degrees of assay and target promiscuity were surprisingly small; average degrees of target promiscuity ranged from 2.6 to 3.4 and the median degree was 2.0. Thus, target promiscuity remained at a low level even for the most extensively tested active compounds. These findings provide further evidence that bioactive compounds are less promiscuous than drugs and have implications for pharmaceutical research. In addition to a possible explanation that drugs are more extensively tested for additional targets, the results would also support a "promiscuity enrichment model" according to which promiscuous compounds might be preferentially selected for therapeutic efficacy during clinical evaluation to ultimately become drugs.
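
    The distinction between assay promiscuity and target promiscuity described above can be illustrated with a minimal sketch. The record layout `(compound_id, assay_id, target_id, is_active)`, the function name `promiscuity_degrees`, and the toy records are assumptions made for illustration; they are not taken from the study, which worked with PubChem screening data.

```python
from collections import defaultdict
from statistics import mean, median


def promiscuity_degrees(records):
    """Return {compound: (assay_promiscuity, target_promiscuity)}.

    records: iterable of (compound_id, assay_id, target_id, is_active).
    This layout is a hypothetical simplification of screening data.
    """
    active_assays = defaultdict(set)
    active_targets = defaultdict(set)
    tested = set()
    for compound, assay, target, is_active in records:
        tested.add(compound)  # inactivity records still count as "tested"
        if is_active:
            active_assays[compound].add(assay)
            active_targets[compound].add(target)
    return {
        c: (len(active_assays.get(c, ())), len(active_targets.get(c, ())))
        for c in tested
    }


# Toy data, invented for illustration only.
records = [
    ("cpd1", "a1", "T1", True),
    ("cpd1", "a2", "T1", True),   # second assay, same target
    ("cpd1", "a3", "T2", False),  # tested but inactive
    ("cpd2", "a1", "T1", True),
    ("cpd2", "a4", "T3", True),
    ("cpd3", "a1", "T1", False),  # consistently inactive
]

degrees = promiscuity_degrees(records)

# Summary statistics over compounds with at least one active record.
target_degrees = [t for _, t in degrees.values() if t > 0]
mean_target_promiscuity = mean(target_degrees)
median_target_promiscuity = median(target_degrees)
```

    In this toy set, `cpd1` is active in two assays that address the same target, so its assay promiscuity (2) exceeds its target promiscuity (1), while `cpd3` has a degree of zero on both counts.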

    Extracting Compound Profiling Matrices from Screening Data


    Promiscuity progression of bioactive compounds over time [v1; ref status: indexed, http://f1000r.es/5cx]

    In the context of polypharmacology, compound promiscuity is rationalized as the ability of small molecules to specifically interact with multiple targets. To study promiscuity progression of bioactive compounds in detail, nearly 1 million compounds and more than 5.2 million activity records were analyzed. Compound sets were assembled by applying different data confidence criteria and selecting compounds with activity histories over many years. On the basis of release dates, compounds and activity records were organized on a time course, which ultimately enabled monitoring data growth and promiscuity progression over nearly 40 years, beginning in 1976. Surprisingly low degrees of promiscuity were consistently detected for all compound sets and there were only small increases in promiscuity over time. In fact, most compounds had a constant degree of promiscuity, including compounds with an activity history of 10 or 20 years. Moreover, during periods of massive data growth, beginning in 2007, promiscuity degrees also remained constant or displayed only minor increases, depending on the activity data confidence levels. Considering high-confidence data, bioactive compounds currently interact with 1.5 targets on average, regardless of their origins, and display essentially constant degrees of promiscuity over time. Taken together, our findings provide expectation values for promiscuity progression and magnitudes among bioactive compounds as activity data further grow
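
    Organizing activity records on a time course, as described above, amounts to counting for each compound how many distinct targets have been reported up to a given year. The following is a minimal sketch under assumed inputs: the `(compound_id, target_id, year)` record layout, the function name `promiscuity_progression`, and the toy records are hypothetical and do not reproduce the data confidence criteria applied in the study.

```python
from collections import defaultdict


def promiscuity_progression(records, years):
    """Return {year: mean cumulative number of targets per compound}.

    records: iterable of (compound_id, target_id, year), a hypothetical
    simplification of dated activity records. Only compounds with at
    least one activity record up to a given year are averaged.
    """
    # compound -> target -> earliest year the activity was reported
    first_seen = defaultdict(dict)
    for compound, target, year in records:
        previous = first_seen[compound].get(target)
        if previous is None or year < previous:
            first_seen[compound][target] = year

    progression = {}
    for y in years:
        counts = []
        for targets in first_seen.values():
            n = sum(1 for t_year in targets.values() if t_year <= y)
            if n > 0:
                counts.append(n)
        progression[y] = sum(counts) / len(counts) if counts else 0.0
    return progression


# Toy data, invented for illustration only.
records = [
    ("c1", "T1", 1990),
    ("c1", "T2", 2008),  # c1 gains a second target in 2008
    ("c2", "T1", 2000),
    ("c2", "T1", 2010),  # repeat measurement, no new target
    ("c3", "T3", 2008),
]

progression = promiscuity_progression(records, years=[1990, 2000, 2008])
```

    Repeat measurements against an already known target leave the degree unchanged, which is one reason the mean can stay nearly constant even while the number of activity records grows.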

    Sparse Topological Pharmacophore Graphs for Interpretable Scaffold Hopping

    The aim of scaffold hopping (SH) is to find compounds whose scaffolds differ from those of already known active compounds, opening up unexplored regions of chemical space. We previously demonstrated the usefulness of pharmacophore graphs (PhGs) for this purpose through proof-of-concept virtual screening experiments. PhGs consist of nodes and edges corresponding to pharmacophoric features (PFs) and their topological distances. Although PhGs were effective in SH, they are hard to interpret because they are complete graphs. Herein, we introduce an intuitive representation of a molecule, termed sparse pharmacophore graphs (SPhGs), which preserves the topological distances among PFs as much as possible while reducing the number of edges in the graphs. Several benchmark calculations quantitatively confirmed the sparseness of the graphs and the preservation of topological distances among pharmacophoric points. As proof-of-concept applications, virtual screening (VS) trials for SH were conducted using active and inactive compounds from the ChEMBL and PubChem databases for three biological targets: thrombin, tyrosine kinase ABL1, and κ-opioid receptor. VS performance was comparable to that obtained with fully connected PhGs. Furthermore, highly ranked SPhGs were interpretable for the three biological targets, in particular for thrombin, for which selected SPhGs were in agreement with the structure-based interpretation.

    Promiscuity progression of bioactive compounds over time [v2; ref status: indexed, http://f1000r.es/5h4]

    In the context of polypharmacology, compound promiscuity is rationalized as the ability of small molecules to specifically interact with multiple targets. To study promiscuity progression of bioactive compounds in detail, nearly 1 million compounds and more than 5.2 million activity records were analyzed. Compound sets were assembled by applying different data confidence criteria and selecting compounds with activity histories over many years. On the basis of publication dates, compounds and activity records were organized on a time course, which ultimately enabled monitoring data growth and promiscuity progression over nearly 40 years, beginning in 1976. Surprisingly low degrees of promiscuity were consistently detected for all compound sets and there were only small increases in promiscuity over time. In fact, most compounds had a constant degree of promiscuity, including compounds with an activity history of 10 or 20 years. Moreover, during periods of massive data growth, beginning in 2007, promiscuity degrees also remained constant or displayed only minor increases, depending on the activity data confidence levels. Considering high-confidence data, bioactive compounds currently interact with 1.5 targets on average, regardless of their origins, and display essentially constant degrees of promiscuity over time. Taken together, our findings provide expectation values for promiscuity progression and magnitudes among bioactive compounds as activity data further grow

    Activity-relevant similarity values for fingerprints and implications for similarity searching [version 2; referees: 3 approved]

    A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various attempts have been made to establish relationships between calculated fingerprint similarity values and biological activities, none of these has yielded generally applicable rules for similarity searching. In this study, we have addressed the question of molecular versus activity similarity in a more fundamental way. First, we have evaluated if activity-relevant similarity value ranges could in principle be identified for standard fingerprints and distinguished from similarity resulting from random compound comparisons. Then, we have analyzed if activity-relevant similarity values could be used to guide typical similarity search calculations aiming to identify active compounds in databases. It was found that activity-relevant similarity values can be identified as a characteristic feature of fingerprints. However, it was also shown that such values cannot be reliably used as thresholds for practical similarity search calculations. In addition, the analysis presented herein helped to rationalize differences in fingerprint search performance
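
    For readers unfamiliar with fingerprint similarity searching, the sketch below shows the kind of threshold-based calculation the abstract refers to. It uses the Tanimoto coefficient, a standard similarity measure for fingerprints, although the abstract does not state which measure was applied. Fingerprints are represented as sets of "on" bit positions, and the function names, toy fingerprints, and threshold of 0.7 are assumptions for illustration only.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Two empty fingerprints share no features; treat similarity as 0.
    return len(a & b) / union if union else 0.0


def similarity_search(query, database, threshold):
    """Return (name, similarity) pairs at or above threshold, best first.

    database: {name: fingerprint}. The threshold is a free parameter;
    the abstract above cautions that no generally reliable value exists.
    """
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints, invented for illustration only.
query = {1, 2, 3, 4}
database = {
    "a": {1, 2, 3, 5},     # 3 shared bits of 5 in the union
    "b": {1, 2, 3, 4, 6},  # 4 shared bits of 5 in the union
    "c": {7, 8},           # no shared bits
}

hits = similarity_search(query, database, threshold=0.7)
```

    With these toy inputs only `b` passes the threshold. Shifting the threshold to 0.6 would also admit `a`, which illustrates how sensitive the hit list is to a value that, according to the study, cannot be fixed reliably in advance.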

    Comparing predictive ability of QSAR/QSPR models using 2D and 3D molecular representations

    Quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) models predict biological activities and molecular properties based on the numerical relationship between chemical structures and activity (property) values. Molecular representations are of importance in QSAR/QSPR analysis. Topological information of molecular structures (2D representations) is usually utilized for this purpose. However, conformational information seems important because molecules exist in three-dimensional space. As a three-dimensional molecular representation applicable to diverse compounds, similarity between a test molecule and a set of reference molecules has been previously proposed. This 3D representation was found to be effective in virtual screening for early enrichment of active compounds. In this study, we introduced the 3D representation into QSAR/QSPR modeling (regression tasks). Furthermore, we investigated the relative merits of 3D representations over 2D in terms of the diversity of training data sets. For the prediction of quantum mechanics-based properties, the 3D representations were superior to 2D. For predicting the activity of small molecules against specific biological targets, no consistent trend was observed in the difference in performance between the two types of representations, irrespective of the diversity of training data sets.

    Activity-relevant similarity values for fingerprints and implications for similarity searching [version 1; referees: 3 approved]
